Revisiting the negative example sampling problem for predicting protein-protein interactions
Authors
Abstract
MOTIVATION: A number of computational methods have been proposed that predict protein-protein interactions (PPIs) based on protein sequence features. Since the number of potential non-interacting protein pairs (negative PPIs) is very high both in absolute terms and in comparison to that of interacting protein pairs (positive PPIs), computational prediction methods rely upon subsets of negative PPIs for training and validation. Hence, the need arises for subset sampling for negative PPIs.

RESULTS: We clarify that there are two fundamentally different types of subset sampling for negative PPIs. One is subset sampling for cross-validated testing, where one desires unbiased subsets so that predictive performance estimated with them can be safely assumed to generalize to the population level. The other is subset sampling for training, where one desires the subsets that best train predictive algorithms, even if these subsets are biased. We show that confusion between these two fundamentally different types of subset sampling led one study recently published in Bioinformatics to the erroneous conclusion that predictive algorithms based on protein sequence features are hardly better than random in predicting PPIs. Rather, both protein sequence features and the 'hubbiness' of interacting proteins contribute to effective prediction of PPIs. We provide guidance for appropriate use of random versus balanced sampling.

AVAILABILITY: The datasets used for this study are available at http://www.marcottelab.org/PPINegativeDataSampling.

CONTACT: [email protected]; [email protected].

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
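The contrast between random and balanced sampling of negative PPIs can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not define "balanced"; the sketch assumes it means that each protein appears in the negative set no more often than it appears in the positive set, which removes the hub signal. The function names `random_negatives` and `balanced_negatives`, the greedy procedure, and the toy data are all hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def random_negatives(proteins, positives, n, seed=0):
    """Uniformly sample n pairs from all candidate pairs that are not known positives."""
    rng = random.Random(seed)
    positive_set = {frozenset(p) for p in positives}
    candidates = [frozenset(p) for p in combinations(sorted(proteins), 2)
                  if frozenset(p) not in positive_set]
    return rng.sample(candidates, n)


def balanced_negatives(positives, seed=0, max_tries=10000):
    """Greedy sketch (assumed definition of 'balanced'): draw negative pairs so that
    no protein occurs more often in the negatives than it does in the positives."""
    rng = random.Random(seed)
    positive_set = {frozenset(p) for p in positives}
    # Remaining "budget" per protein = its number of appearances in the positives.
    remaining = Counter(x for pair in positives for x in pair)
    negatives = set()
    for _ in range(max_tries):
        pool = [p for p, c in remaining.items() if c > 0]
        if len(pool) < 2:
            break
        weights = [remaining[p] for p in pool]
        a, b = rng.choices(pool, weights=weights, k=2)
        pair = frozenset((a, b))
        # Reject self-pairs, known positives, and duplicates.
        if a == b or pair in positive_set or pair in negatives:
            continue
        negatives.add(pair)
        remaining[a] -= 1
        remaining[b] -= 1
    return list(negatives)


# Toy data: protein "A" is a hub in the positive set.
positives = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
proteins = {"A", "B", "C", "D", "E", "F"}
print(random_negatives(proteins, positives, 4))
print(balanced_negatives(positives))
```

Under these assumptions, the random sample reflects the population of non-interacting pairs, which is the property the abstract asks of test sets, while the balanced sample is deliberately biased and would be a candidate only for training.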
Similar resources
Revisiting Beta 2 Glycoprotein I, the Major Autoantigen in the Antiphospholipid Syndrome
Beta 2 glycoprotein I (β2GPI) is a single chain 50 kDa highly glycosylated glycoprotein at an approximate concentration of 4 μM in cells. The abundance of this protein in plasma and its high state of preservation indicate its important role in mammals. In addition, β2GPI has a particular structure in the fifth domain, and is categorized as the major antigen recognized by autoa...
Discovering Domains Mediating Protein Interactions
Background: Protein-protein interactions do not provide any direct information regarding the domains within the proteins that mediate the interactions. The majority of proteins are multi domain proteins and the interaction between them is often defined by the pairs of their domains. Most of the former studies focus only on interacting domain pairs. However they do not consider the in...
The value of serum level of S100B protein in predicting brain edema in children with diabetes ketoacidosis
Background and Objective: The S100B protein has recently been considered as an important marker for predicting severe brain damage; however, there has been very little evidence of an increase in this marker in cerebral edema due to metabolic disorders such as diabetes ketoacidosis (DKA). This study was designed and performed to evaluate the prognostic role of S100B protein in predicting brain edema...
Predicting Protein-Protein Interactions from Multimodal Biological Data Sources via Nonnegative Matrix Tri-Factorization
Protein interactions are central to all the biological processes and structural scaffolds in living organisms, because they orchestrate a number of cellular processes such as metabolic pathways and immunological recognition. Several high-throughput methods, for example, yeast two-hybrid system and mass spectrometry method, can help determine protein interactions, which, however, suffer from hig...
Rabies Infection: An Overview of Lyssavirus-Host Protein Interactions
Viruses are obligatory intracellular parasites that use cell proteins to take the control of the cell functions in order to accomplish their life cycle. Studying the viral-host interactions would increase our knowledge of the viral biology and mechanisms of pathogenesis. Studies on pathogenesis mechanisms of lyssaviruses, which are the causative agents of rabies, have revealed some important ho...
Journal title:
- Bioinformatics
Volume 27, Issue 21
Pages -
Publication date 2011